Project Proposal

Group #6

Published

December 2, 2024

Team members

  • Zhuoyan Guo
  • Xue Qin
  • Congfei Yin

Predicting AIDS Patient Survival with Different Treatments

Background

HIV, or human immunodeficiency virus, is a virus that attacks the immune system by targeting white blood cells, making the body more vulnerable to illnesses like tuberculosis, infections, and certain cancers. Without treatment, HIV can progress over time to its most severe stage, AIDS (acquired immunodeficiency syndrome). The virus is transmitted through bodily fluids such as blood, breast milk, semen, and vaginal fluids, and can also be passed from mother to child.1

The signs and symptoms include:

  • fever

  • headache

  • rash

  • sore throat

HIV-Symptom

Data sources

  • Data URL

  • It is a public dataset containing records of patients who were diagnosed with AIDS.

  • The data contains 2139 instances with integer and categorical variables.

Proposal

The goal of this project is to predict AIDS patient treatment failure (death) or survival (censoring) within a certain window of time by comparing two different treatments and examining patient demographics, treatment indicators, and clinical factors such as CD4 counts and Karnofsky scores.

The methodology will combine survival analysis and machine learning methods. The Kaplan-Meier estimator and its survival curves will be used to compare the two treatments by visualizing the probability of survival over time. Prediction models such as Linear Regression, Logistic Regression, and Decision Tree will be used to predict the patients’ survival outcomes. The models will be evaluated throughout development. While constructing the models, the project will also identify the most significant features that contribute to patient survival and evaluate their influence.
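To make the Kaplan-Meier step concrete, the following is a minimal sketch of the estimator in plain Python. It is an illustration under stated assumptions, not the project's actual code: the function name `kaplan_meier` and the toy follow-up times and event flags are hypothetical and do not come from the dataset. At each distinct event time, the running survival probability is multiplied by (1 - deaths / patients still at risk). Censored patients leave the risk set without lowering the curve.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each patient
    events:    1 if the patient failed (e.g. died) at that time, 0 if censored
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")

    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)   # everyone leaving the risk set at time t
    at_risk = len(durations)     # patients still under observation
    survival = 1.0
    curve = []

    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit update: only event times change the estimate.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        at_risk -= exits[t]
    return curve


# Hypothetical toy data for one treatment arm (not taken from the dataset).
toy_durations = [5, 8, 12, 12, 20, 25]
toy_events = [1, 0, 1, 1, 0, 1]

for time, prob in kaplan_meier(toy_durations, toy_events):
    print(f"t={time}: S(t)={prob:.3f}")
```

Running the same function once per treatment arm and plotting the two step curves together would give the visual comparison described above. In practice a survival analysis library would likely be used instead, since it also provides confidence intervals and significance tests.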

Ultimately, this project aims to provide not only a reliable model for predicting AIDS treatment outcomes but also clinical insight into how different treatments and patient characteristics influence survival rates.

Questions to address

  1. How do patient demographics (e.g., age, gender, baseline health status) influence the effectiveness of different treatment methods on survival outcomes?
  2. What clinical features (e.g., CD4 counts, Karnofsky score) significantly impact survival outcomes for patients undergoing different treatments?
     • Which combination of CD4 counts and Karnofsky scores provides the most accurate prediction of patient outcomes for each treatment method?
  3. Which treatment method would be better?
  4. What are the most important indicators/features for determining the diagnosis result?
     • What are the most significant clinical factors influencing patient survival when considering CD cell counts and treatment methods?
  5. What is the best model to predict the diagnosis result?

Project planned timeline

  • Proposal Outline - Sep 30th

  • Data Understanding, Cleaning and EDA - Oct 6th

  • Literature Review - Oct 20th

  • Methodology and Code Development - Nov 3rd

  • Presentation Preparation - Nov 24th

  • Report Write Up - Dec 1st

  • The three teammates will work together and hold bi-weekly meetings to discuss the project progress.